Whenever too much information is available, it is desirable to summarize it. Creating summaries manually takes a lot of time, so automating the summarization process makes it easier and reduces the time needed to produce a summary. This is called automatic summarization. The two main approaches are extractive summarization and abstractive summarization. Extractive summarization and its applications have been the subject of extensive research and have reached state-of-the-art solutions. Abstractive summarization, however, is still a developing field, because it is difficult to create an abstractive summary the way humans do. How to evaluate the quality of a summary also remains an open question. Therefore, this paper is a comprehensive survey of the datasets used for summarization, with their details and statistics, an analysis of various abstractive summarization techniques, and the important parameters for evaluating the quality of a summary. Deep learning based models have given new direction to this field. The author also focuses on the problems and challenges faced in summary generation, which open up future research scope in this domain.
1. INTRODUCTION

A massive amount of data is produced and consumed every day, largely because cheap internet access has driven heavy use of social media. Everyone wants to pull out the important information from this data, since no one wants to spend much time and effort reading a lot of text only to gain little information. To reduce this time and effort, textual information needs to be consolidated and organized into a summary, shrinking lengthy data into a short form. With today's busy schedules, no one wants to spend a lot of time reading unnecessary data [1].

The objective of summarization is to condense large amounts of information into shorter, more concise versions while retaining the most important and relevant content. Summarization aims to provide a quick overview of a text, document, or piece of information, making it easier and faster for readers to grasp the main points without going through the entire original text. This can be done manually by humans or automatically by algorithms, with the goal of saving time and effort in information processing and comprehension. Summarization serves various purposes, such as: i) optimize topic coverage: the summary must incorporate all the main topics of the original document; ii) optimize readability: the summary must have a proper logical flow so that it is easy to understand; iii) optimize coherence: the words, sentences, and paragraphs of the summary must be connected; iv) reduce redundancy: the summary should not repeat sentences; and v) reduce the size of the original text: the original document needs to be compressed according to the type of summary the user needs.

Process of an automatic text summarizer (ATS): a general ATS system comprises the following steps. First, the required data is acquired and passed to the system. Text pre-processing is the initial step, in which the raw data is prepared for the next stage. The processing is then done with a model chosen according to the summarization approach. Finally, post-processing addresses remaining problems, such as reordering the sentences. Together these steps produce the output summary. Automatic summarization can further be categorized on the basis of different factors; Table 1 shows the categorization based on the input elements, the approach, and the output elements.
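
The ATS steps described above can be sketched as a tiny Python pipeline. This is a minimal illustration under stated assumptions, not a system from the surveyed literature: the function names are hypothetical, pre-processing is reduced to whitespace normalisation and naive sentence splitting, the "model" is any caller-supplied sentence-scoring function (an extractive stand-in), and post-processing only restores the source order of the selected sentences.

```python
import re
from typing import Callable, List

def preprocess(raw_text: str) -> List[str]:
    """Hypothetical pre-processing: normalise whitespace, split into sentences."""
    text = re.sub(r"\s+", " ", raw_text).strip()
    # Naive split after sentence-final punctuation (an assumption for illustration).
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def process(sentences: List[str], score: Callable[[str], float], k: int) -> List[int]:
    """Hypothetical processing step: indices of the k best-scoring sentences."""
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return ranked[:k]

def postprocess(sentences: List[str], chosen: List[int]) -> str:
    """Hypothetical post-processing: reorder chosen sentences as in the source."""
    return " ".join(sentences[i] for i in sorted(chosen))

def summarize(raw_text: str, score: Callable[[str], float], k: int = 2) -> str:
    sentences = preprocess(raw_text)
    chosen = process(sentences, score, k)
    return postprocess(sentences, chosen)

if __name__ == "__main__":
    doc = "Cheap internet increased data.  Users want summaries. Summaries save time."
    # Placeholder "model": longer sentences score higher (illustration only).
    print(summarize(doc, score=len, k=2))
```

Swapping the processing step for a trained model leaves the other two stages unchanged, which is the practical benefit of keeping them separate.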


Table 1. Different types of summarization on the basis of various factors

Factor | Description
On the basis of input element
Input size: single document | The output summary is produced from only one input document.
Input size: multi document | The output summary is produced from many input documents [2].
Input language: mono | The input text from which the summary is generated is in a single language (e.g., English or Hindi).
Input language: multi | Documents in multiple languages are summarized to produce the summary.
Input language: cross | The summary is generated in a language different from that of the source document [3].
Input medium: text | The input document comprises text only, and the summary is generated from text.
Input medium: multimedia | The input document can be audio or video, from which the summary is produced.
Purpose: generic | The model makes no assumptions about domain-specific knowledge for deciding which words or phrases are important to the summarizer; it relies on a general understanding of the language and its common words.
Purpose: domain specific | The summary requires domain-specific knowledge to decide which words or phrases are important to the summarizer, e.g., for bioscience documents.
Purpose: query based | Summaries are user focused: they are tailored to the needs of a particular group of users or to users who need an answer to a query [4].
On the basis of approach
Extractive approach | Extractive methods take sentences or phrases directly from the source or input text [5]. The summary is formed from the important sentences of the input and no new sentences are created, which makes this approach easier to achieve.
Abstractive approach | Abstractive systems must first understand the semantics of the text and then use natural language generation (NLG) to create a brief summary through paraphrasing, sentence reduction, and synonym substitution [5]. The summary is generated by picking up the important keywords, rephrasing sentences, and creating short sentences out of lengthy ones, much as humans do.
Hybrid approach | A newer, trending approach that, as the name suggests, fuses the extractive and abstractive approaches and has its own advantages [6].
On the basis of output
Summary type: extractive | A concise summary made only of the important information extracted from the original text.
Summary type: abstractive | Sentences are recreated from important keywords together with new words and phrases that are not present in the original document, while also reducing its size. This is very similar to the way humans summarize: the brain builds an internal semantic representation of the text it reads and then recreates the sentences with new words.
Summary content: indicative | An indicative summary does not include the substance of the input text but indicates its basic subject matter or domain [7]. It simply helps the reader decide whether to read the source document. The summary is reduced to 5% of the input length.
Summary content: informative | This type of summary includes all the important content of the source document [7]. The document is compressed to 20% of the whole text length.
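
To make the extractive approach and the compression rates in Table 1 concrete, the following sketch scores each sentence by the average document frequency of its content words and keeps whole source sentences until roughly a chosen fraction of the source words is used. The frequency heuristic, the tiny stop-word list, and the `ratio` parameter are illustrative assumptions, not a method from the cited works.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it"}

def tokenize(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def extractive_summary(text: str, ratio: float = 0.2) -> str:
    """Select whole source sentences until about `ratio` of the words are kept."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's content words in the whole text.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    budget = max(1, round(ratio * len(text.split())))  # target length in words
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True):
        length = len(sentences[i].split())
        if chosen and used + length > budget:
            continue  # skip sentences that would exceed the budget
        chosen.append(i)
        used += length
    # Keep the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Every sentence in the output is copied verbatim from the input; an abstractive system would instead generate new sentences. Setting `ratio` near 0.05 or 0.2 would roughly mirror the indicative and informative compression rates listed in Table 1.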


Research objectives: the goals of this research are to survey in detail the datasets used for ATS, to analyze the abstractive summarization techniques used so far, and to understand how to assess the quality of summaries produced by ATSs.

Structure of the paper: the most commonly used datasets for summarization, including their size, domain, language, and type, are discussed in section 2. Section 3 overviews the methods used for abstractive summarization. The evaluation metrics are explained in section 4. The results and analysis of the survey in terms of datasets and methods are presented in section 5. Finally, section 6 concludes the work and the analysis.
2. DATASET USED FOR AUTOMATIC TEXT SUMMARIZATION

Datasets play a crucial role in building any deep learning model, and summarization models are no exception: the dataset is what prepares the model to learn the summary generation task. Our survey shows that most datasets used for text summarization consist of news articles, namely DUC 2001-2007, text analysis conference (TAC) 2008-2011, New York Times (NYT) [8], CNN/daily mail [9], and Gigaword. Other datasets include LCSTS, which contains microblogging data, and PubMedCite, which contains biomedical papers. Table 2 lists the datasets available for summarization with their size, domain, language, and type.


Table 2. Comparison of datasets on size, domain, language, and type

Dataset | Dataset size | Domain | Language | Type
DUC 2001-2003 | 60x10 | News | English | Single and multi document
DUC 2004 | 100x10 | News | English and Arabic | Single and multi document
DUC 2005 | 50x32 | News | English | Multi document
DUC 2006 | 50x25 | News | English | Multi document
DUC 2007 | 25x10 | News | English | Multi document
TAC 2008 | 48x20 | News | English | Multi document
TAC 2009 | 44x20 | News | English | Multi document
TAC 2010 | 46x20 | News | English | Multi document
TAC 2011 | 44x20 | News | English | Multi document
CNN/daily mail | 287,226 | News | English | Single document
Newsroom [10] | 995,041 | News | English | Single document
Multinews | | News | English | Multi document
SummBank | 40x10 | News | English | Single and multi document
CAST | 147 | News | English | Single document
NYT | 589,284 | News | English | Single document
XSUM | 226,711 | Headline generation | English | Single document
EASC | 153 | News and Wikipedia | Arabic | Single document
Gigaword | 3,800,000 | News | English | Single document
LCSTS | 2,400,591 | Microblogging | Chinese | Single document
Opinions | 51x100 | Reviews | English | Multi document
PubMedCite | 192K | Biomedical papers | English | Single document
S2ORC | 81.1 million | Academic papers | English | Single document


3. METHODS FOR ABSTRACTIVE SUMMARIZATION 

Abstractive summarization recreates sentences from important keywords together with new vocabulary that is not present in the original document, while also reducing the size of the original document. The techniques and methods used in abstractive summarization are broadly classified into three categories, i.e., structure based, semantic based, and deep learning based, as shown in Figure 1.


Figure 1. Various abstractive summarization techniques
[Figure: abstractive summarization techniques in three branches. Structure based approach (use of prior knowledge and psychological feature schemas): template based, lead and body phrase, rule based, ontology based, graph based, and tree based methods. Semantic based approach (use of NLP for understanding the meaning of language): multimodal semantic method, information item based method, semantic text representation model, and semantic graph model. Seq2Seq models (deep learning based models): RNN, bidirectional RNN, gated RNN (LSTM, GRU), attention mechanism, and transformers.]
3.1. Structure based approach

One of the traditional approaches is the structure based approach, which builds the summary on prior knowledge. The fundamental idea behind structure-based solutions is to encode the most important data from the input document using prior knowledge, psychological feature schemas with flexible alternative structures, and templates with extraction rules, as well as graphs, ontologies, trees, and lead and body structures [11].

Lead and body phrase method: proposed by Tanaka et al. [12], this method summarizes broadcast news by analyzing the syntactic structure of the lead and body sentences. Inspired by the sentence fusion method, it uses sentence revision to construct a summary by inserting and replacing identical phrases found in the lead and body chunks. The advantage of this approach is that it finds semantically appropriate revisions of the lead sentence [13]. Its problem is that parsing mistakes degrade the completeness and grammaticality of the sentences and can introduce repetition.

Template based method: this approach uses a predefined template to describe a whole document. Text snippets that can be mapped into the template slots are found using linguistic patterns or extraction rules, and these snippets act as indicators for the summary [14]. The method produces thorough and coherent summaries and can be applied to multi-document summarization. Its disadvantage is that the linguistic patterns and extraction rules for the template slots must be designed manually, and similar information spread over several documents cannot be handled [15].
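
A toy version of the template based method follows: hand-written extraction rules (here, regular expressions) fill the slots of a fixed summary template. The earthquake-report template, the slot names, and the patterns are hypothetical examples invented for illustration; they also expose the weakness noted above, since every rule has to be written by hand for one narrow domain.

```python
import re
from typing import Dict, Optional

# Hypothetical extraction rules: one regular expression per template slot.
RULES: Dict[str, str] = {
    "magnitude": r"magnitude\s+(\d+(?:\.\d+)?)",
    "location": r"\bin\s+([A-Z][a-z]+)",
    "deaths": r"(\d+)\s+(?:people\s+)?(?:died|were killed)",
}

# Hypothetical summary template whose slots match the rule names.
TEMPLATE = "A magnitude {magnitude} earthquake hit {location}, killing {deaths} people."

def fill_template(text: str) -> Optional[str]:
    """Fill every slot from the text; return None if any rule finds nothing."""
    slots = {}
    for slot, pattern in RULES.items():
        match = re.search(pattern, text)
        if match is None:
            return None  # the template cannot be completed from this document
        slots[slot] = match.group(1)
    return TEMPLATE.format(**slots)

if __name__ == "__main__":
    report = ("On Monday an earthquake of magnitude 6.1 struck in Chile. "
              "Officials said 12 people died.")
    print(fill_template(report))
```

The output sentence is not copied from the source; it is generated from the template, which is what makes this family of methods abstractive.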

Rule based method: in rule-based approaches, input documents are represented in terms of categories and lists of aspects. Verbs and nouns with similar meanings are identified in order to generate extraction rules. Candidate rules are selected and passed to the summary generation module, where generation patterns are used to produce the summary sentences. The strength of this method is its capacity to produce summaries with higher information density. Its main weakness is that all the rules and patterns must be written by hand, which is painstaking and time consuming.

Ontology based method: one of the most important examples of this method is the “fuzzy ontology” approach proposed by Lee et al. [16], which describes uncertain information in Chinese news summarization. The advantage of this approach is its focus on documents from a particular domain; it handles uncertainty in the text well and delivers coherent summaries. However, it requires a solid ontology, and creating one takes a long time because it relies mainly on domain experts.

Tree based method: tree-based approaches first group the related sentences in the source that contain pertinent information, and these sentence clusters are then used for abstractive summarization. Parsers create dependency trees, a common tree-based representation of related words. Finally, some of the sentence clusters are used to build trees from which summary sentences are produced, using a method similar to pruning and linearization [17]. Using language generators results in more fluent, less redundant summaries, which raises the quality of the output. The drawback of this method is that many important sentences are missed because context is not taken into account.


3.2. Semantic based approach

The second common traditional approach for generating summaries is the semantic based approach, which follows a completely different idea from the structure based approach. The basic goal of semantic-based techniques is to identify noun and verb phrases by feeding a linguistic representation of the document(s) into an NLG system [18].

Multimodal semantic model: this model captures the concepts represented by both images and text in multimodal documents and establishes the relationships between these concepts. Building the semantic model begins with an object-based knowledge representation, in which concepts are represented as nodes and the connections between them represent their relationships. To construct the summary, the selected concepts are finally converted into sentences.

Information item based method: instead of creating the abstract from the phrases of the input file, this method creates it from an abstract representation of the input. The abstract representation is the information item, the smallest cohesive informational unit found in a text [19]. This framework was put forth at TAC for multi-document news summarization. At the beginning of the information item (INIT) retrieval process, a parser analyzes the grammatical structure of the text to create subject-verb-object triples. Most INITs do not form complete sentences, so they must be combined into a sentence structure before text is generated. This approach produces summaries that are concise, cohesive, information rich, and less redundant. However, the quality of the produced summary suffers because a lot of important information is excluded.

Semantic text representation model: instead of focusing on the syntax or structure of the text, this technique analyzes the input text using word meanings. Semantic role labelling is used to retrieve the predicate argument structure of each phrase or sentence. The document set is split into sentences, each given a document number and a position number, and the SENNA semantic role labeler application programming interface (API) is used for the labelling. A similarity matrix of semantic similarity scores is then built using a semantic graph. Next, an enhanced graph-based ranking algorithm takes into account the predicate structures, the semantic similarity measure, and the relationships within the document set. Finally, maximal marginal relevance (MMR) is applied to reduce or remove repetition in the summary.
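
The MMR step can be sketched as a greedy selection that rewards similarity to the document and penalises similarity to sentences already chosen: score(s) = λ·sim(s, q) - (1 - λ)·max sim(s, s') over selected s'. This sketch assumes plain bag-of-words cosine similarity and uses the sum of all sentence vectors as the query q; the method described above relies on semantic similarity scores instead, and λ = 0.5 is an arbitrary illustrative default.

```python
import math
import re
from collections import Counter
from typing import List

def bow(text: str) -> Counter:
    """Bag-of-words vector (a simplifying assumption instead of semantic similarity)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mmr_select(sentences: List[str], k: int, lam: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance: trade relevance against redundancy."""
    vectors = [bow(s) for s in sentences]
    query = sum(vectors, Counter())  # whole-document vector used as the query (assumption)
    selected: List[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            relevance = cosine(vectors[i], query)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For the sentences ["the cat sat on the mat", "the cat sat on the mat", "dogs bark loudly"] with k = 2, setting λ = 1 (pure relevance) selects the duplicate again, whereas λ = 0.5 skips it in favour of the new content, which is the redundancy reduction MMR is used for.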

Semantic graph based model: this approach builds a semantic graph known as the rich semantic graph (RSG) and works in three phases. First, the input documents are expressed as an RSG, which represents the nouns and verbs of the original document as graph nodes, with edges denoting their semantic and topological relationships; the syntactic and semantic links created in the pre-processing module connect the concepts of the various sentences. Second, the original graph is condensed into a smaller graph using heuristic methods. Lastly, the reduced semantic graph is used to generate the abstractive summary. This approach produces less repetitive and grammatically sound sentences. However, it is only applicable to single documents and does not support multi-document summarization [20].


3.3. Deep learning based model 

Recurrent neural network (RNN) encoder-decoder summarizer: this is a Seq2Seq model, which can be used for any problem involving sequential information. A text summarizer produces a brief summary, itself a sequence, from the long sequence of words in the body of the text, so summarization can be modelled as a many-to-many Seq2Seq problem. A Seq2Seq model primarily consists of two elements: an encoder and a decoder. An RNN is a deep learning model that processes data sequentially, with the input at each state depending on the output of the preceding state. Because it is unidirectional, it only predicts from the past [21], [22].
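
The recurrence in an RNN encoder can be illustrated with a minimal NumPy sketch, in which each hidden state h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h) depends only on the current input and the previous state. The dimensions, the random (untrained) weights, and the function name are assumptions for illustration; a real summarizer learns the weights and pairs the encoder with a decoder.

```python
import numpy as np

def rnn_encode(inputs, W_xh, W_hh, b_h):
    """Run a vanilla (Elman) RNN left to right and return all hidden states."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:                               # one time step per input vector
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # h_t depends on x_t and h_(t-1)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
emb_dim, hidden_dim, seq_len = 4, 3, 5
inputs = rng.normal(size=(seq_len, emb_dim))         # stand-in for word embeddings
W_xh = rng.normal(scale=0.5, size=(hidden_dim, emb_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

states = rnn_encode(inputs, W_xh, W_hh, b_h)
context = states[-1]   # final state handed to the decoder in a basic Seq2Seq model
print(states.shape)    # (5, 3)
```

Because the loop only moves forward, changing the last input leaves every earlier state unchanged, which is what "it only predicts from the past" means in practice.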

Bidirectional RNN: this class of recurrent neural network architecture processes the input data in both the forward and backward directions. Because of this bidirectional processing, the network can gather information from both past and future contexts, which makes it particularly useful for tasks where comprehension of the complete sequence is essential [23]. A bidirectional RNN operates as follows. The forward RNN receives the input sequence and processes it from left to right, generating at each time step a hidden state that summarizes the data up to that point in the sequence. Simultaneously, the same input sequence is fed into the backward RNN, which processes it from right to left. Like the forward RNN, the backward RNN creates a sequence of hidden states, each summarizing the data from the last time step back to the current one. At every time step, the hidden states of the forward and backward RNNs are combined, and the concatenated hidden states give a more complete picture by capturing both past and future contexts. However, it still suffers from the vanishing gradient problem, and its computational cost is higher than that of a unidirectional RNN.
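
A bidirectional encoder can be sketched by running two independent RNNs, one over the sequence and one over its reverse, and concatenating their states at each time step. Concatenation is one common way of combining them (summing or averaging are alternatives); the weights are random placeholders and the helper names are hypothetical.

```python
import numpy as np

def rnn_states(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN over `inputs` in the given order; returns all hidden states."""
    h = np.zeros(W_hh.shape[0])
    out = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        out.append(h)
    return np.stack(out)

def birnn_encode(inputs, fwd_params, bwd_params):
    """Concatenate forward (left-to-right) and backward (right-to-left) states."""
    forward = rnn_states(inputs, *fwd_params)
    # Run the second RNN on the reversed sequence, then flip its states back
    # so that row t summarises inputs t .. T-1.
    backward = rnn_states(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(1)
emb_dim, hidden_dim, seq_len = 4, 3, 6

def make_params():
    """Random placeholder weights (W_xh, W_hh, b_h) for one direction."""
    return (rng.normal(scale=0.5, size=(hidden_dim, emb_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

fwd_params, bwd_params = make_params(), make_params()
inputs = rng.normal(size=(seq_len, emb_dim))
states = birnn_encode(inputs, fwd_params, bwd_params)
print(states.shape)  # (6, 6): 3 forward + 3 backward values per time step
```

The first row now depends on the last input through its backward half, while its forward half still sees only the first input; running two RNNs instead of one is also where the extra computational cost comes from.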

Gated RNN: the two commonly used gated RNNs are the long short-term memory (LSTM) and the gated recurrent unit (GRU) [24]. When an RNN is trained on long sequences, the vanishing gradient problem arises. Gated RNNs mitigate this issue by using gates that control which information is kept in, or overwritten in, the hidden state.
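
The gating idea can be seen in a single GRU step, written here in NumPy with random placeholder weights. The equations follow one common GRU formulation (libraries differ in whether the update gate weights the old or the new state); the parameter names are assumptions for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update; `p` maps parameter names to weight matrices / bias vectors."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])   # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])   # reset gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])  # candidate
    # Gate-weighted mix of old and candidate state: the direct path from h_prev
    # to h_t eases gradient flow over long sequences compared with a plain RNN.
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
emb_dim, hidden_dim = 4, 3
p = {}
for gate in ("z", "r", "h"):
    p[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_dim, emb_dim))
    p[f"U_{gate}"] = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    p[f"b_{gate}"] = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(10, emb_dim)):   # run over a random 10-step sequence
    h = gru_step(x_t, h, p)
print(h.shape)  # (3,)
```

When the update gate z is close to zero, the old state is copied through almost unchanged, which is how a gated cell can carry information across many time steps.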

Attention mechanism: the idea of an attention mechanism is to predict each output word by concentrating on the most relevant parts of the input sequence instead of relying on a single representation of the complete sequence. This is how the problem of long sequences can be addressed [25]. Rather than treating all words of the source sequence equally, the model emphasizes the particular source words that lead to the target word.
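
A minimal sketch of attention over encoder states is given below, using dot-product scoring (one common choice; additive scoring is another). The toy vectors are invented for illustration. The softmax gives every source position some weight, but most of the mass goes to the positions most similar to the current decoder state.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

def dot_attention(decoder_state, encoder_states):
    """Dot-product attention: weights over source positions and the context vector."""
    scores = encoder_states @ decoder_state     # one score per source position
    weights = softmax(scores)                   # non-negative, sums to 1
    context = weights @ encoder_states          # weighted sum of encoder states
    return weights, context

# Toy encoder states for three source words and one decoder state (invented values).
encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
decoder_state = np.array([4.0, 0.0])
weights, context = dot_attention(decoder_state, encoder_states)
print(np.round(weights, 3))
```

With these values the first and third source positions receive almost all the weight, so the context vector passed to the decoder is dominated by them.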

Transformers: earlier RNNs had to process input tokens sequentially, whereas transformers process all words at once and are highly parallelizable [26]. The main building block of the transformer is the self-attention mechanism. There are three types of transformer: i) encoder-decoder, ii) encoder only, and iii) decoder only. An example of an encoder-only transformer is bidirectional encoder representations from transformers (BERT), developed by [27], which is used for a range of tasks, including linguistic inference and question answering.
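
Self-attention, the transformer's main building block, can be sketched as single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, computed for all tokens in one set of matrix products; this is where the parallelism comes from. The sketch omits positional encodings, masking, multiple heads, residual connections, and feed-forward layers, and uses random weights with arbitrary sizes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over all tokens at once."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (n_tokens, n_tokens)
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 8, 4
X = rng.normal(size=(n_tokens, d_model))                 # stand-in token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.shape)   # (5, 4) (5, 5)
```

Without positional encodings, shuffling the input tokens simply shuffles the output rows, which is why real transformers add position information to the embeddings.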


4. EVALUATION METRICS 

Every process that produces an outcome needs an evaluation metric to judge the quality of that outcome, and summaries likewise need to be evaluated for quality. However, evaluating an automatically generated summary is challenging for the following reasons: i) summary quality depends on the user's needs, meaning that the user's requirements define the information to include, and two users may have different needs; ii) summarization compresses the source document while keeping the important information intact, so the summary needs to be evaluated at different compression rates, which makes the task more challenging; iii) manual evaluation is tedious, so an effective evaluation measure is needed; and iv) the content varies with the goal of the summary, so it is challenging to collect this information automatically [28]. Summaries can be evaluated with two broad categories of methods, intrinsic and extrinsic, as shown in Figure 2.


Figure 2. Various evaluation techniques
[Figure: evaluation metrics are divided into intrinsic methods, comprising evaluation on the basis of content and evaluation on the basis of quality (text coherence, referential clarity, non-redundancy, grammatical correctness), and extrinsic (task based) methods: information retrieval, question answering, and text classification.]


4.1. Intrinsic methods 

In this method, the summary itself is assessed, mainly through human judgement. Intrinsic evaluation rates a summary on parameters such as coherence, topic coverage, and information quality [29], and the summary can be rated by comparing it with a summary written by a human. The two most important criteria for evaluating a summary are its quality and its informativeness. Informativeness is typically judged against another summary created by a person, called a reference summary.

Intrinsic evaluation on the basis of quality: this addresses how to know whether summaries are good enough in terms of quality. The summary is evaluated on various parameters that together form the characteristics of a good summary [30], i.e., coverage, informativeness, text coherence, and readability.

Intrinsic evaluation on the basis of content: recall-oriented understudy for gisting evaluation (ROUGE), developed by [31], has become the standard metric for summary evaluation. It is a family of measures; for ROUGE-N, one first decides the n-gram size and then computes the following over the n-grams shared by the model (system) summary and the reference summary:

a. $\text{Precision} = \dfrac{\text{No. of } n\text{-grams found in both reference and model summaries}}{\text{No. of } n\text{-grams in the model summary}}$
b. $\text{Recall} = \dfrac{\text{No. of } n\text{-grams found in both reference and model summaries}}{\text{No. of } n\text{-grams in the reference summary}}$
c. $F_1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
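
The ROUGE-N precision, recall, and F1 defined above can be computed with a short Python sketch. It assumes lower-cased whitespace tokenisation, a single reference summary, and clipped n-gram counts; the official ROUGE toolkit adds options such as stemming and multiple references, so scores from this simplified version may differ.

```python
from collections import Counter
from typing import Dict, List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N with clipped n-gram overlap and whitespace tokenisation."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())   # Counter '&' keeps the minimum count
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
print(scores)
```

Here 5 of the 6 unigrams overlap, so precision, recall, and F1 are all 5/6; with n = 2 only 3 of 5 bigrams match, giving 0.6.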

Bilingual evaluation understudy (BLEU) [32] compares a candidate sentence with one or more reference sentences to measure how closely it resembles them, and assigns a score ranging from 0 to 1. Metric for evaluation of translation with explicit ordering (METEOR) is another metric for evaluating machine-generated text. It is less widely used, but it correlates better with human judgement, with a correlation of 0.964 compared with 0.817 for BLEU.
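
BLEU combines modified (clipped) n-gram precisions p_1 ... p_N through their geometric mean and multiplies the result by a brevity penalty BP = exp(1 - r/c) when the candidate length c does not exceed the reference length r (BP = 1 otherwise). The sketch below is an unsmoothed sentence-level version with uniform weights and N = 4 by default; practical implementations usually add smoothing, and corpus-level BLEU aggregates counts over all sentences, so treat it as illustrative only.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights (0 if any p_n is 0)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            return 0.0                       # candidate shorter than n tokens
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)        # '|' keeps the maximum count per n-gram
        clipped = sum(min(count, max_ref[g]) for g, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / sum(cand.values()))
    c = len(candidate)
    # Brevity penalty uses the reference length closest to the candidate length.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

An exact copy of the reference scores 1.0, while a short candidate such as ["the", "cat"] against "the cat sat on the mat" keeps perfect precisions but is scaled down by the brevity penalty exp(1 - 6/2).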


4.2. Extrinsic method 

This approach assesses the summary's quality with a task-based performance indicator, such as an information retrieval task [33]. These tasks can be text classification, question answering, and information retrieval. In the extrinsic method, the usefulness of a summary is evaluated in relation to a particular application setting, such as relevance assessment and reading comprehension.
5. RESULTS AND DISCUSSION

Dataset: in our research, we considered the most widely used datasets for automatic text summarization. Many news datasets are available, and much work has been done on them. News datasets give better results because the information in news articles is localized and hence easy to summarize. The chart in Figure 3 compares the available datasets by domain: news accounts for the largest share, 80%, followed by the other domains, namely microblogging, biomedical papers, reviews, scientific papers, and academic papers, at 4% each. This reveals the challenge of low availability of domain-specific resources: the available datasets are generic rather than domain specific. As a result, much work has been done in generic fields but not in domain-specific contexts such as the biomedical and medical fields, where summarization could be very beneficial.


Figure 3. Statistics of summarization dataset on the basis of domain
[Figure: chart of the share of summarization datasets per domain; legend: news, microblogging, biomedical papers, academic papers, reviews, scientific papers.]


Methods: the various approaches to abstractive summarization, namely structure based, semantic based [34], and deep learning based [35], are discussed in this manuscript and analyzed in terms of their strengths and weaknesses. According to our analysis, deep learning based models are the better choice. Transformers combined with the attention mechanism can produce better summaries than the earlier approaches, especially when lengthy texts must be summarized, because the traditional approaches are not capable of producing informative summaries of lengthy articles that satisfy all the quality parameters. It is also difficult to produce summaries of domain-specific articles.


6. CONCLUSION 

This article surveys the relevant scientific literature to review different abstractive summarization methods, highlighting their advantages and disadvantages. Many challenges and issues remain in abstractive summarization, as it is more difficult to achieve than extractive summarization. Deep learning models with attention mechanisms can be more fruitful in creating summaries of lengthy articles. Our study also covers the datasets used, with their details and statistics, which show that most available datasets belong to the news domain, followed by other domains such as microblogging, reviews, scientific papers, and academic papers, and that they are mainly generic rather than domain specific. The survey also discusses the parameters involved in evaluating a summary, including ROUGE, BLEU, and METEOR. Since a proper evaluation, or a claim of a state-of-the-art solution, cannot rest on a single metric, in future work we plan to implement a domain-specific abstractive summarization model, for example for the biomedical or healthcare domain, and to evaluate and compare its summaries on more than one metric.